A Lexical Semantic and Statistical Approach to Lexical Collocation Extraction for Natural Language Generation
Abstract
Dissertation Abstract
Lexical collocations are frequently occurring word pairs in natural language whose presence is not always predictable by their usage. These collocations are used by native speakers of a language almost without thought, yet they must be learned by nonnative speakers of the language. A native speaker of English might say that he/she drinks "strong coffee," but a nonnative speaker might say either "powerful coffee" or "sturdy coffee." Collocations tend to vary among languages and topic domains. Unfortunately, the task of correctly identifying lexical collocations, even by native speakers of the language, has been shown to be difficult. Computer systems that translate natural languages, or machine-translation systems, need lexical collocation information to produce natural-sounding or colloquially proper text. Natural language generation is a component of a machine-translation system that automatically produces natural-sounding text in a particular language given a language-independent meaning as input. This dissertation (Doerr 1994) demonstrates how to automatically locate and extract lexical collocations from machine-readable text for use within a machine-translation system's …
… comprise words for which neither word can be substituted by a synonym or hyponym. Potential collocations comprising certain adjacent part-of-speech tags are extracted from text. An online thesaurus and a lexical database of word classes are queried for synonyms and hyponyms, respectively, for each potential collocation. These queries create potential challenger pairs, such as "strong java" and "powerful coffee." A substitution procedure is then applied to determine whether any of these challenger pairs occurs more frequently than the potential collocation. The VERIFY lexical collocation-extraction system has been implemented incorporating these ideas. Results to date have been positive; a system using lexical-semantic knowledge, that is, synonymy and hyponymy, within a lexical collocation-extraction system outperforms a system using purely statistical knowledge. To compare system output to human judgments of training data, a training component was also incorporated into VERIFY. This component is able to adapt to new data. Overall system performance, measured in recall and precision scores, improved using this component. To provide a more flexible system given a user's application, a weighting mechanism was used to produce a range of recall and precision scores. These weights can be adjusted to optimize system performance. Through an experimental procedure, VERIFY was trained on a set of human-analyzed data and then independently tested on another set of …
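The substitution idea described in the abstract can be illustrated with a minimal sketch. This is not the VERIFY implementation: the tiny `SYNONYMS` and `HYPONYMS` dictionaries are hypothetical stand-ins for the online thesaurus and the word-class database, the function names are invented for illustration, and the part-of-speech filtering step is assumed to have already produced the candidate pair. The sketch also assumes that a challenger pair replaces exactly one word of the candidate, as in the "strong java" and "powerful coffee" examples.

```python
from collections import Counter

# Hypothetical toy resources (assumptions, not the real thesaurus or database).
SYNONYMS = {
    "strong": {"powerful", "sturdy"},
    "powerful": {"strong"},
}
HYPONYMS = {
    "coffee": {"java"},
}

def substitutes(word):
    """Return the synonyms and hyponyms known for a word."""
    return SYNONYMS.get(word, set()) | HYPONYMS.get(word, set())

def challengers(pair):
    """Build challenger pairs by replacing exactly one word of the candidate."""
    left, right = pair
    swapped_left = {(s, right) for s in substitutes(left)}
    swapped_right = {(left, s) for s in substitutes(right)}
    return swapped_left | swapped_right

def bigram_counts(tokens):
    """Count adjacent word pairs in a token sequence."""
    return Counter(zip(tokens, tokens[1:]))

def passes_substitution_test(pair, counts):
    """Keep the candidate only if no challenger pair is more frequent."""
    return all(counts[c] <= counts[pair] for c in challengers(pair))

tokens = ("she drinks strong coffee and he drinks strong coffee "
          "not powerful coffee").split()
counts = bigram_counts(tokens)

print(passes_substitution_test(("strong", "coffee"), counts))    # True
print(passes_substitution_test(("powerful", "coffee"), counts))  # False
```

In this toy corpus "strong coffee" occurs twice and "powerful coffee" once, so the first candidate survives and the second is rejected because its challenger "strong coffee" is more frequent.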
Related resources
Meaning Representation for Automatic Extraction of Lexical Functions
Abstract. Lexical functions formalize semantic and syntactic relations between lexical units, given that meaning of an individual word largely depends on various relations connecting it to other words in context. Collocational relation is a type of institutionalized lexical relations that holds between the base and its partner in a collocation in contrast to free word combination where both word...
The Comparison of Native English and Persian Elementary School Students’ Performance on Lexical and Grammatical Collocations
The importance and howness of language learning/ acquisition has been a great concern for decades. There are many factors that play important roles in this regard. This research compared the performance of native Persian and English elementary students to see if there is any significant difference between the two groups and which type of collocation they performed better within the groups. For ...
Collocations in Multilingual Natural Language Generation: Lexical Functions meet Lexical Functional Grammar
In a collocation, the choice of one lexical item depends on the choice made for another. This poses a problem for simple approaches to lexicalisation in natural language generation systems. In the Meaning-Text framework, recurrent patterns of collocations have been characterised by lexical functions, which offer an elegant way of describing these relationships. Previous work has shown that usin...
MedLex+: An Integrated Corpus-Lexicon Medical Workbench for Swedish
This paper reports on the work carried out developing MedLex+, a medical corpus-lexicon workbench for Swedish. This project, which is still under active development, has been going on for some years now within the Department of Swedish language at Göteborg University. At the moment, the workbench incorporates: an annotated collection of medical texts, including 20 million tokens and 45,000 docume...
The Effect of Lexical Collocational Density on the Iranian EFL Learners’ Reading Comprehension
The present study aims at investigating the effect of different levels of lexical collocational density on EFL learners’ reading comprehension. Eighty sophomore students with different levels of proficiency studying at Zand Institute of Higher Education in Shiraz, Iran were chosen from among eighty five learners based on their score distribution on a reduced TOEFL test constructed by Education...
Journal title:
- AI Magazine
Volume 16
Publication date: 1995